Repairing partially completed transactions in fast consensus protocol

ABSTRACT

In an approach, a processor detects a transmission control protocol disconnection of a first distributed storage unit from a distributed storage network, wherein the distributed storage network comprises a set of distributed storage units. A processor identifies a transaction, wherein: the transaction is not in a final state, the transaction is a first proposal, from the first distributed storage unit, for the set of distributed storage units to store a dataset with a first revision number within the distributed storage network, and the dataset is broken into one or more data pieces to be written on the set of distributed storage units of the distributed storage network that approve the proposal. A processor identifies a timestamp of the transaction. A processor determines a stage the transaction has reached. A processor places the transaction in a final state based on the determined stage the transaction has reached.

BACKGROUND

The present invention relates generally to the field of dispersed storage computer networks, and more particularly to repairing partially completed transactions from fast consensus protocols in dispersed storage computer networks.

A dispersed, or distributed, storage computer network (DSN) is a computer network where information is stored on more than one node or DS unit within the network, often in a replicated fashion. In a DSN, when more than one proposer attempts to update the same revision/version of a data source at the same time, contention arises. The proposers in this case are generally two or more DS processing units attempting to update the same data source (segment or data source representing the meta-data object) at the same time. Since strong consistency is a desirable property for a DSN, some form of consensus protocol is generally required. Typically, if all actors in the system follow such a protocol, strong consistency is ensured. In addition, it is advantageous for consensus to be achievable in a single round-trip in the contended case while performing acceptably.

Protocols, such as Paxos, were developed to establish a distinguished client using an out-of-band process to deal with consensus issues. A goal of Paxos is for some number of peers to reach an agreement on a value; Paxos guarantees that if one peer believes some value has been agreed upon by a majority, the majority will never agree on a different value. The protocol is designed such that any agreement must go through a majority of nodes. The out-of-band process of Paxos may add overhead to the DSN because additional round trips may be required. Strong consistency properties are achieved during an overwrite of a specific revision or the initial write of some source name in DSN memory (sometimes called “a contest”). Multiple DS processing units participate in the same contest if they attempt to update the same revision of a data source stored in the same DSN memory. A variant of Paxos, called “Fast Paxos,” was developed to enable consensus to be established in a single network round trip time (RTT) in the ideal case of no contention. Fast Paxos generally has separate distinct phases for: 1) electing a leader; and 2) proposing new values for consensus.

SUMMARY

Aspects of an embodiment of the present invention disclose a method, computer program product, and computer system. A processor detects a transmission control protocol disconnection of a first distributed storage unit from a distributed storage network, wherein the distributed storage network comprises a set of distributed storage units. A processor identifies a transaction, wherein: the transaction is not in a final state, the transaction is a first proposal, from the first distributed storage unit, for the set of distributed storage units to store a dataset with a first revision number within the distributed storage network, and the dataset is broken into one or more data pieces to be written on the set of distributed storage units of the distributed storage network that approve the proposal. A processor identifies a timestamp of the transaction. A processor determines a stage the transaction has reached. A processor places the transaction in a final state based on the determined stage the transaction has reached.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic block diagram of an embodiment of a dispersed, or distributed, storage network (DSN), in accordance with an embodiment of the present invention.

FIG. 2 is a flowchart depicting operational steps of a rebuilder unit executing within the dispersed storage network (DSN) of FIG. 1, in accordance with an embodiment of the present invention.

FIG. 3 is a schematic block diagram of components of a computing device of the DSN of FIG. 1, in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

Embodiments of the present invention recognize that fast consensus protocols, such as Fast Paxos and Fast Paxos with Implicit Leader Election (FPILE), reach consensus in a single round-trip. While it is advantageous for consensus to be achievable in a single round-trip, the downside of a single round-trip consensus is the lack of a second “commit” phase. This creates a problem within DS systems that work via erasure coding: there is no natural place or time to clean up a partially completed transaction when a DS processing unit driving the transaction (i.e., a proposer) crashes at any point before successfully completing the transaction.

A transaction is the overall process taken to obtain a consensus from DS units within a DS network (DSN) to encode data via erasure coding or make a revision to data encoded via erasure coding to the DS units within the DSN. Erasure coding breaks data into a width, or number, of pieces with some redundancy, such that if any threshold number of those pieces is obtained, the original data can be reconstructed. Typically, each width piece is stored on a different storage node (i.e., DS unit). A proposer initiates a transaction by sending out a “propose” message (i.e., write request) to each DS unit on which a width piece of data could be stored (i.e., a width of DS units). A proposer's write request succeeds if some write threshold of pieces is successfully written to the width of DS units. The write threshold is the number of DS units that must accept the proposal made by the proposer for that proposal to achieve consensus, where the write threshold can be less than the width. When the transaction is successful, there is a finalize phase that removes any previous revisions or competitors' proposals that are no longer required.
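The relationship between width, threshold, and write threshold described above can be illustrated with a minimal sketch. The class and function names, the sample values, and the ordering constraint (threshold no greater than write threshold) are assumptions made for illustration and are not defined by the embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErasureParams:
    """Hypothetical container for the three quantities named in the text."""
    width: int            # number of pieces produced, one per DS unit
    threshold: int        # pieces needed to reconstruct the original data
    write_threshold: int  # acceptances needed for a proposal to reach consensus

    def __post_init__(self):
        # Assumed ordering; the text only states that the write threshold
        # can be less than the width.
        if not (1 <= self.threshold <= self.write_threshold <= self.width):
            raise ValueError("expected 1 <= threshold <= write_threshold <= width")


def can_reconstruct(params: ErasureParams, pieces_available: int) -> bool:
    """True when enough pieces exist to rebuild the original data."""
    return pieces_available >= params.threshold


def has_consensus(params: ErasureParams, acceptances: int) -> bool:
    """True when enough DS units accepted the proposal."""
    return acceptances >= params.write_threshold
```

With hypothetical parameters of width 16, threshold 10, and write threshold 12, a proposal accepted by 11 DS units could be reconstructed but would not yet have reached consensus.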

Since strong consistency is a top priority for a DSN, there is no correctness concern with leaving partially completed transactions on the DSN memory. Embodiments of the present invention recognize, however, that leaving partially completed transactions on the DSN memory unnecessarily consumes space resources in DSN memory. In this manner, as discussed in greater detail herein, embodiments of the present invention provide a rebuilder DS unit that identifies partially completed transactions and takes appropriate action, depending on what stage of the transaction the proposer was at when it failed, to leave the transaction in a final state. Appropriate action may include conducting the finalize phase in which previous revisions and competing proposals are removed.

Embodiments of the present invention further recognize the difficulty in determining when a proposer has crashed and/or failed, so a rebuilder DS unit cleaning up partially completed transactions must proceed in a way that won't allow a proposer to believe the proposer succeeded when the proposer actually failed at some stage of completing a transaction.

The present invention will now be described in detail with reference to the Figures.

FIG. 1 is a schematic block diagram of an embodiment of a dispersed, or distributed, storage network (DSN) 100. FIG. 1 provides only an illustration of one embodiment and does not imply any limitations with regard to the environments in which different embodiments may be implemented. In the depicted embodiment, DSN 100 includes a plurality of DS units in the form of computing devices 110 i through 110 n, management unit 120, rebuilder unit 130, and DSN memory 150. DSN 100 may include additional DS units in the form of computing devices, servers, user devices, or other devices not shown.

The components of DSN 100 are coupled to network 140. Network 140 may be a local area network (LAN), a wide area network (WAN) such as the Internet, the public switched telephone network (PSTN), a mobile broadband network, such as 4G and Long Term Evolution (LTE), any combination thereof, or any combination of connections and protocols that will support communications between computing devices 110 i through 110 n, management unit 120, rebuilder unit 130, and DSN memory 150, in accordance with embodiments of the invention. Network 140 may include wired, wireless, or fiber optic connections.

Computing devices 110 i through 110 n operate as DS processing units that may be a portable computing device and/or a fixed computing device. Each of computing devices 110 i through 110 n is capable of being a proposer of a transaction. A portable computing device may be a social networking device, a gaming device, a cell phone, a smart phone, a digital assistant, a digital music player, a digital video player, a laptop computer, a handheld computer, a tablet, a video game controller, and/or any other portable device that includes a computing core. A fixed computing device may be a computer (PC), a computer server, a cable set-top box, a satellite receiver, a television set, a printer, a fax machine, home entertainment equipment, a video game console, and/or any type of home or office computing equipment. Note that management unit 120 and rebuilder unit 130 may be separate computing devices, may be a common computing device, and/or may be integrated into one or more of the computing devices 110 i through 110 n and/or into one or more of the storage units 152 i through 152 n. Computing devices 110 i through 110 n, management unit 120, and rebuilder unit 130 may include components as depicted and described in further detail with respect to FIG. 3.

Each of computing devices 110 i through 110 n, management unit 120, and rebuilder unit 130 includes an interface, as seen in FIG. 1. Each interface includes software and hardware to support one or more communication links via network 140 indirectly and/or directly. Each of computing devices 110 i through 110 n is capable of communicating with DSN memory 150 and, therefore, is capable of being a proposer of a transaction.

DSN memory 150 includes a plurality of DS units, such as storage units 152 i through 152 n, that may be located at geographically different sites (e.g., one in Chicago, one in Milwaukee, etc.), at a common site, or a combination thereof. For example, DSN memory 150 includes eight storage units and each storage unit is located at a different site. As another example, DSN memory 150 includes eight storage units and all eight storage units are located at the same site. As yet another example, DSN memory 150 includes eight storage units and a first pair of storage units are at a first common site, a second pair of storage units are at a second common site, a third pair of storage units are at a third common site, and a fourth pair of storage units are at a fourth common site. Note that DSN memory 150 may include any number of storage units.

In operation, management unit 120 performs DS management services. For example, management unit 120 establishes distributed data storage parameters (e.g., vault creation, distributed storage parameters, security parameters, billing information, user profile information, etc.) for computing devices 110 i through 110 n individually or as part of a group of user devices. As a specific example, management unit 120 coordinates creation of a vault (e.g., a virtual memory block associated with a portion of an overall namespace of DSN 100) within DSN memory 150 for a computing device, a group of devices, or for public access and establishes per vault dispersed storage (DS) error encoding parameters for a vault. Management unit 120 facilitates storage of DS error encoding parameters for each vault by updating registry information of DSN 100, where the registry information may be stored in DSN memory 150, computing devices 110 i through 110 n, management unit 120, and/or the rebuilder unit 130.

Management unit 120 creates and stores user profile information (e.g., an access control list (ACL)) in local memory and/or within memory of DSN memory 150. The user profile information includes authentication information, permissions, and/or the security parameters. The security parameters may include encryption/decryption scheme, one or more encryption keys, key generation scheme, and/or data encoding/decoding scheme.

Management unit 120 creates billing information for a particular user, a user group, a vault access, public vault access, etc. For instance, the management unit 120 tracks the number of times a user accesses a non-public vault and/or public vaults, which can be used to generate per-access billing information. In another instance, the management unit 120 tracks the amount of data stored and/or retrieved by a user device and/or a user group, which can be used to generate per-data-amount billing information.

As another example, management unit 120 performs network operations, network administration, and/or network maintenance. Network operations include authenticating user data allocation requests (e.g., read and/or write requests), managing creation of vaults, establishing authentication credentials for user devices, adding/deleting components (e.g., user devices, storage units, and/or computing devices) to/from DSN 100, and/or establishing authentication credentials for storage units 152 i through 152 n. Network administration includes monitoring devices and/or units for failures, maintaining vault information, determining device and/or unit activation status, determining device and/or unit loading, and/or determining any other system level operation that affects the performance level of DSN 100. Network maintenance includes facilitating replacing, upgrading, repairing, and/or expanding a device and/or unit of DSN 100.

Rebuilder unit 130 performs rebuilding of ‘bad’ or missing encoded data pieces caused by a proposer crashing before completing a transaction. In an embodiment, rebuilder unit 130 performs this rebuilding by identifying partially completed transactions and taking appropriate action, depending on what stage of the transaction the proposer was at when it failed and/or crashed, to leave the transaction in a final state. For example, if the proposer crashed before consensus was reached and consensus is nevertheless reached, rebuilder unit 130 completes the finalize phase by removing previous revisions and competing proposals. An advantage of ensuring that this finalize phase is completed is that the finalize phase allows subsequent readers to read an agreed item of the proposal without verifying that the original proposal was accepted by a write threshold of DS units.

In an embodiment, rebuilder unit 130 can detect that a proposer has crashed when the proposer used a TCP connection to communicate with a width of DS units within DSN 100 and rebuilder unit 130 identifies a disconnection of the TCP connection. In an embodiment, rebuilder unit 130 can identify a partially completed transaction and an associated timestamp. In an embodiment, rebuilder unit 130 can determine what stage the transaction reached before the disconnection and, based on that determination, can take appropriate action to leave the transaction in a final state. The process rebuilder unit 130 follows to clean up partially completed transactions is described in more detail with respect to FIG. 2.

In the depicted embodiment, rebuilder unit 130 is a separate unit within DSN 100 with interface connection to all components of DSN 100 through network 140. In another embodiment, rebuilder unit 130 may reside elsewhere within DSN 100 such as within one of the computing devices or storage units, provided rebuilder unit 130 has access to network 140. Rebuilder unit 130 is described in further detail with respect to FIG. 2.

FIG. 2 is a flowchart depicting operational steps of rebuilder unit 130 executing within DSN 100 of FIG. 1, in accordance with an embodiment of the present invention. In the depicted embodiment, rebuilder unit 130 operates to identify partially completed transactions and take appropriate action, depending on what stage of the transaction the proposer was at when it failed and/or crashed, to leave the transaction in a final state. It should be appreciated that the process depicted in FIG. 2 illustrates one possible execution of rebuilder unit 130.

In step 205, rebuilder unit 130 detects a disconnection of a proposer. In an embodiment, rebuilder unit 130 detects a disconnection of a proposer that used a TCP connection to communicate with a width of DS units, including rebuilder unit 130, within DSN 100. In some embodiments, rebuilder unit 130 does not detect a disconnection and the process begins at step 210.
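One way a DS unit might observe that a proposer's stream connection has gone away is sketched below. This is an illustration only: the function name `peer_disconnected` is hypothetical, and the check covers only the case in which the peer's end is closed or reset. A proposer that loses power without its host sending a close would typically be noticed only through timeouts or keepalive probes, which this sketch does not attempt.

```python
import socket


def peer_disconnected(conn: socket.socket) -> bool:
    """Report whether the peer has closed or reset its end of a stream socket.

    Peeks at the receive queue without consuming data, so pending
    protocol messages are left untouched.
    """
    previous_timeout = conn.gettimeout()
    conn.setblocking(False)
    try:
        data = conn.recv(1, socket.MSG_PEEK)
    except BlockingIOError:
        return False  # nothing to read, connection still open
    except ConnectionError:
        return True   # connection was reset or aborted
    finally:
        conn.settimeout(previous_timeout)  # restore the caller's blocking mode
    return data == b""  # an empty read signals an orderly close by the peer
```

A rebuilder could call such a check on each connection it shares with a proposer and treat a `True` result as the trigger for step 205.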

In step 210, rebuilder unit 130 identifies a partially completed transaction. In an embodiment, rebuilder unit 130 identifies a partially completed transaction on a DS unit. In several embodiments, rebuilder unit 130 is a DS unit within the width of DS units. In an embodiment, rebuilder unit 130 identifies a partially completed transaction on a DS unit within the width of DS units, including rebuilder unit 130, with which the proposer had established a TCP connection whose disconnection rebuilder unit 130 then detected. In an embodiment, rebuilder unit 130 identifies a partially completed transaction by identifying a piece of data on the DS unit. In an embodiment, rebuilder unit 130 identifies a partially completed transaction by searching one or more DS units. In an embodiment, rebuilder unit 130 searches one or more DS units and keeps a list of proposals that have not been finalized. In another embodiment, in which rebuilder unit 130 is within the width of DS units involved in the transaction, rebuilder unit 130 knows when the transaction has not been finalized and checks the status of the transaction that has not been finalized by issuing read requests to the other DS units in the width. In yet another embodiment, rebuilder unit 130 issues read requests only when the width of DS units has over a threshold number of non-finalized transactions.
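The bookkeeping in the last three embodiments above (keeping a list of non-finalized proposals and issuing read requests only once that list grows past a threshold) might look like the following sketch. The class name, method names, and transaction identifiers are hypothetical.

```python
class PendingProposalTracker:
    """Tracks proposals that have been seen but not yet finalized."""

    def __init__(self, max_pending: int):
        # Hypothetical limit: reads are issued only when the number of
        # non-finalized transactions exceeds this value.
        self.max_pending = max_pending
        self._pending = set()

    def record_proposal(self, tx_id: str) -> None:
        self._pending.add(tx_id)

    def record_finalize(self, tx_id: str) -> None:
        self._pending.discard(tx_id)

    def pending(self) -> list:
        """Non-finalized transaction identifiers, in sorted order."""
        return sorted(self._pending)

    def should_issue_reads(self) -> bool:
        """True once more than max_pending proposals remain non-finalized."""
        return len(self._pending) > self.max_pending
```

Deferring reads until the backlog is over the limit trades cleanup latency for fewer read requests to the other DS units.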

In step 215, rebuilder unit 130 identifies a timestamp of the partially completed transaction. In an embodiment, rebuilder unit 130 identifies a timestamp that the proposer included in the metadata of the transaction. In an embodiment, rebuilder unit 130 will only clean up the partially completed transaction if the age indicated by the timestamp meets or surpasses a predetermined threshold. For example, rebuilder unit 130 will clean up a partially completed transaction whose timestamp is five minutes old or older.
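The age check in step 215 can be sketched as follows. The five-minute value comes from the example above; the function and constant names are hypothetical, and the sketch assumes the proposer's timestamp and the rebuilder's clock are comparable (for example, both in UTC).

```python
from datetime import datetime, timedelta, timezone

# Threshold taken from the five-minute example; a deployment would choose its own.
CLEANUP_AGE = timedelta(minutes=5)


def is_eligible_for_cleanup(proposed_at: datetime, now: datetime,
                            min_age: timedelta = CLEANUP_AGE) -> bool:
    """True when the transaction's age meets or surpasses the threshold."""
    return now - proposed_at >= min_age
```

Waiting for a minimum age gives a slow but live proposer time to finish before the rebuilder intervenes.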

In step 220, rebuilder unit 130 determines what stage the transaction reached before the disconnection. In an embodiment, rebuilder unit 130 determines what stage the transaction reached before the disconnection by issuing read requests for an item in the transaction. In one example, rebuilder unit 130 may read that the transaction was successful and finalized and no further action is needed. In another example, rebuilder unit 130 may read that the transaction was successful, a write threshold of DS units accepted the proposal, but the transaction has not been finalized. In yet another example, rebuilder unit 130 may read that the transaction is not yet successful, but at least a threshold of DS units have received the proposal. In yet another example, rebuilder unit 130 may read that the transaction is not yet successful and the original data cannot be reconstructed.
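The four outcomes listed in step 220 can be expressed as a small classification over the results of the read requests. The sketch below assumes the reads have already been summarized into a count of DS units holding the proposal and a flag indicating whether it was finalized; the enum, its member names, and the function signature are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    FINALIZED = "successful and finalized"
    ACCEPTED_NOT_FINALIZED = "write threshold reached, not finalized"
    RECONSTRUCTIBLE = "threshold reached, write threshold not reached"
    UNRECOVERABLE = "threshold not reached"


def classify_stage(accepted: int, finalized: bool,
                   threshold: int, write_threshold: int) -> Stage:
    """Map summarized read results onto the stages described in step 220."""
    if accepted >= write_threshold:
        return Stage.FINALIZED if finalized else Stage.ACCEPTED_NOT_FINALIZED
    if accepted >= threshold:
        return Stage.RECONSTRUCTIBLE
    return Stage.UNRECOVERABLE
```

For instance, with a hypothetical threshold of 10 and write threshold of 12, a proposal found on 11 DS units falls into the reconstructible stage.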

In step 225, rebuilder unit 130 cleans up the partially completed transaction based on what stage the transaction reached before the disconnection. Rebuilder unit 130 cleans up the partially completed transaction while preserving the consensus protocol goals of consistency and safety.

In an embodiment, if rebuilder unit 130 determines the transaction was successful but has not been finalized, rebuilder unit 130 issues finalize messages marking the transaction successful for future readers and cleanup messages to remove previous revisions of the data and other failed transactions that competed with the proposer's successful transaction.

In another embodiment, if rebuilder unit 130 determines the transaction is not yet successful but at least a threshold of DS units have received the proposal, rebuilder unit 130 uses the threshold pieces to reconstruct the original data and completes the original proposal by proposing the same proposal on the DS units that need to accept it to reach the write threshold.

In yet another embodiment, if rebuilder unit 130 determines the transaction is not yet successful and the original data cannot be reconstructed because a threshold of DS units have not received the proposal, rebuilder unit 130 competes with the crashed proposer. Rebuilder unit 130 competes with the crashed proposer by attempting to complete a transaction of its own that keeps the same data as in the crashed proposer's partially completed transaction and merely updates a version number, or revision, of the data.

In another embodiment, if rebuilder unit 130 determines the proposer crashed before achieving a write threshold but after achieving a threshold, rebuilder unit 130 competes with the crashed proposer by attempting to complete a transaction of its own that keeps the same data as in the crashed proposer's partially completed transaction and merely updates a version number, or revision, of the data.
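The embodiments of step 225 amount to a dispatch from the determined stage to a set of cleanup actions. The sketch below uses hypothetical stage labels and action names; the `compete_when_reconstructible` flag is likewise an assumption, introduced to select between the two embodiments that apply when a threshold but not a write threshold was reached (completing the original proposal, or competing with it).

```python
# Hypothetical stage labels for the outcomes of step 220.
FINALIZED = "finalized"
ACCEPTED_NOT_FINALIZED = "accepted_not_finalized"
RECONSTRUCTIBLE = "reconstructible"
UNRECOVERABLE = "unrecoverable"


def plan_cleanup(stage: str, compete_when_reconstructible: bool = False) -> list:
    """Return the ordered, hypothetical action names for a given stage."""
    if stage == FINALIZED:
        return []  # already in a final state, nothing to do
    if stage == ACCEPTED_NOT_FINALIZED:
        # Mark the transaction successful, then remove old revisions
        # and failed competing proposals.
        return ["send_finalize", "send_cleanup"]
    if stage == RECONSTRUCTIBLE and not compete_when_reconstructible:
        # Rebuild the data from threshold pieces and finish the original proposal.
        return ["reconstruct_data", "repropose_original"]
    if stage in (RECONSTRUCTIBLE, UNRECOVERABLE):
        # Start a transaction of the rebuilder's own with an updated revision.
        return ["propose_competing_revision"]
    raise ValueError(f"unknown stage: {stage!r}")
```

Returning action names rather than performing the actions keeps the decision logic separate from the messaging layer, which makes it straightforward to check each stage in isolation.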

In this way, rebuilder unit 130 cleans up a partially completed transaction caused by a proposer crashing before completing the transaction when fast consensus protocols are in place. Consensus protocols guarantee consistency and safety, but consensus protocols are not concerned with partially completed transactions left on the DSN memory, which unnecessarily consume space resources.

FIG. 3 is a block diagram depicting components of a computer 300, such as computing devices 110 i through 110 n from FIG. 1. FIG. 3 displays the computer 300, the one or more processor(s) 304 (including one or more computer processors), the communications fabric 302, the memory 306, the cache 316, the persistent storage 308, the communications unit 310, the I/O interfaces 312, the display 320, and the external devices 318. It should be appreciated that FIG. 3 provides only an illustration of one embodiment and does not imply any limitations with regard to the environments in which different embodiments may be implemented. Many modifications to the depicted environment may be made.

As depicted, the computer 300 operates over a communications fabric 302, which provides communications between the cache 316, the computer processor(s) 304, the memory 306, the persistent storage 308, the communications unit 310, and the input/output (I/O) interface(s) 312. The communications fabric 302 may be implemented with any architecture suitable for passing data and/or control information between the processors 304 (e.g., microprocessors, communications processors, and network processors, etc.), the memory 306, the external devices 318, and any other hardware components within a system. For example, the communications fabric 302 may be implemented with one or more buses or a crossbar switch.

The memory 306 and persistent storage 308 are computer readable storage media. In the depicted embodiment, the memory 306 includes a random access memory (RAM). In general, the memory 306 may include any suitable volatile or non-volatile implementations of one or more computer readable storage media. The cache 316 is a fast memory that enhances the performance of computer processor(s) 304 by holding recently accessed data, and data near accessed data, from memory 306.

Program instructions for programs may be stored in the persistent storage 308 or in memory 306, or more generally, any computer readable storage media, for execution by one or more of the respective computer processors 304 via the cache 316. The persistent storage 308 may include a magnetic hard disk drive. Alternatively, or in addition to a magnetic hard disk drive, the persistent storage 308 may include a solid state hard disk drive, a semiconductor storage device, read-only memory (ROM), electronically erasable programmable read-only memory (EEPROM), flash memory, or any other computer readable storage media that is capable of storing program instructions or digital information.

The media used by the persistent storage 308 may also be removable. For example, a removable hard drive may be used for persistent storage 308. Other examples include optical and magnetic disks, thumb drives, and smart cards that are inserted into a drive for transfer onto another computer readable storage medium that is also part of the persistent storage 308.

The communications unit 310, in these examples, provides for communications with other data processing systems or devices. In these examples, the communications unit 310 may include one or more network interface cards. The communications unit 310 may provide communications through the use of either or both physical and wireless communications links. Any programs may be downloaded to the persistent storage 308 through the communications unit 310. In the context of some embodiments of the present invention, the source of the various input data may be physically remote to the computer 300 such that the input data may be received and the output similarly transmitted via the communications unit 310.

The I/O interface(s) 312 allows for input and output of data with other devices that may operate in conjunction with the computer 300. For example, the I/O interface 312 may provide a connection to the external devices 318, which may include a keyboard, keypad, a touch screen, and/or some other suitable input devices. External devices 318 may also include portable computer readable storage media, for example, thumb drives, portable optical or magnetic disks, and memory cards. Software and data used to practice embodiments of the present invention may be stored on such portable computer readable storage media and may be loaded onto the persistent storage 308 via the I/O interface(s) 312. The I/O interface(s) 312 may similarly connect to a display 320. The display 320 provides a mechanism to display data to a user and may be, for example, a computer monitor.

The programs described herein are identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature herein is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.

The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.

Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN), a wide area network (WAN), a mobile broadband network, such as 4G and Long Term Evolution (LTE), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

What is claimed is:
 1. A method comprising: detecting, by one or more processors, a transmission control protocol disconnection of a first distributed storage unit from a distributed storage network, wherein the distributed storage network comprises a set of distributed storage units; identifying, by one or more processors, a transaction, wherein: the transaction is not in a final state, the transaction is a first proposal, from the first distributed storage unit, for the set of distributed storage units to store a dataset with a first revision number within the distributed storage network, and the dataset is broken into one or more data pieces to be written on the set of distributed storage units of the distributed storage network that approve the proposal; identifying, by one or more processors, a timestamp of the transaction; determining, by one or more processors, that the timestamp of the transaction has surpassed a predefined threshold; sending, by one or more processors, a read request for the transaction; determining, by one or more processors, a stage the transaction has reached before the first distributed storage unit disconnected; and responsive to receiving, by one or more processors, a read request return that a write threshold of the set of distributed storage units of the distributed storage network have approved the first proposal and the transaction has not been finalized, issuing, by one or more processors, finalize messages, marking the transaction successful, and cleanup messages, to remove previous revisions of data and failed proposals that competed with the first proposal; responsive to receiving, by one or more processors, a read request return that a write threshold of distributed storage units of the distributed storage network have not approved the first proposal and a threshold of the distributed storage units of the distributed storage network have approved the first proposal, placing, by one or more processors, the transaction in a final state, wherein placing the transaction in a final state comprises: reconstructing, by one or more processors, data of the transaction using data pieces from the distributed storage units that received the first proposal, and proposing, by one or more processors, a second proposal that has the dataset with a second revision number to a subset of distributed storage units of the set of distributed storage units that need to approve the first proposal to reach the write threshold; and responsive to receiving, by one or more processors, a read request return that a threshold of distributed storage units of the distributed storage network have not approved the first proposal, placing, by one or more processors, the transaction in a final state, wherein placing the transaction in a final state comprises: reconstructing, by one or more processors, data of the transaction using data pieces from the distributed storage units that received the first proposal, and proposing, by one or more processors, a second proposal that has the dataset with a second revision number to compete with the first proposal.
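
By way of illustration only, the three responses recited in claim 1 can be sketched as a small decision function. This is a minimal sketch and not the claimed implementation. The names `RepairAction`, `ReadResult`, `is_stale`, and `choose_repair` are hypothetical, and the sketch assumes that the threshold is no larger than the write threshold and that both are expressed as counts of approving distributed storage units.

```python
from dataclasses import dataclass
from enum import Enum


class RepairAction(Enum):
    """Hypothetical labels for the outcomes described in claim 1."""
    NONE = "none"
    FINALIZE_AND_CLEANUP = "finalize_and_cleanup"
    REPROPOSE_TO_MISSING_UNITS = "repropose_to_missing_units"
    REPROPOSE_COMPETING = "repropose_competing"


@dataclass(frozen=True)
class ReadResult:
    """Assumed summary of a read request return for one transaction."""
    approvals: int   # number of units that approved the first proposal
    finalized: bool  # whether the transaction is already in a final state


def is_stale(timestamp: float, now: float, max_age: float) -> bool:
    """True when the transaction timestamp has surpassed the predefined threshold."""
    return now - timestamp > max_age


def choose_repair(result: ReadResult, write_threshold: int, threshold: int) -> RepairAction:
    """Map the stage a transaction reached to a repair action.

    Assumes threshold <= write_threshold (an assumption of this sketch).
    """
    if threshold > write_threshold:
        raise ValueError("threshold must not exceed write_threshold")
    if result.finalized:
        # Already in a final state: nothing to repair.
        return RepairAction.NONE
    if result.approvals >= write_threshold:
        # Enough approvals: issue finalize messages and cleanup messages.
        return RepairAction.FINALIZE_AND_CLEANUP
    if result.approvals >= threshold:
        # Reconstruct the data, then send a second proposal to the units
        # whose approval is still needed to reach the write threshold.
        return RepairAction.REPROPOSE_TO_MISSING_UNITS
    # Below the threshold: reconstruct the data, then issue a second
    # proposal that competes with the first proposal.
    return RepairAction.REPROPOSE_COMPETING
```

For example, with an assumed write threshold of 5 and threshold of 3, a stale, unfinalized transaction with 4 approvals would be routed to the second branch: its data is reconstructed and re-proposed with a second revision number to the units that have not yet approved.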